Real-Time Video Frame Pre-Processing Hardware

ABSTRACT

A dynamically reconfigurable heterogeneous systolic array is configured to process a first image frame, to generate image processing primitives from the image frame, and to store the primitives and the corresponding image frame in a memory store. A characteristic of the image frame is determined. Based on the characteristic, the array is reconfigured to process a following image frame.

BENEFIT CLAIM

This application claims the benefit under 35 U.S.C. §120 as a continuation of application Ser. No. 12/959,281, filed Dec. 2, 2010, which claims the benefit under 35 U.S.C. §119(e) of provisional application 61/362,247, filed Jul. 7, 2010, the entire contents of which are hereby incorporated by reference for all purposes as if fully set forth herein. The applicants hereby rescind any disclaimer of claim scope in the parent applications or the prosecution history thereof and advise the USPTO that the claims in this application may be broader than any claim in the parent applications.

BACKGROUND

After images are acquired by an image sensor within a digital imaging system, the images are typically processed before display or storage on the device. A typical image processing chain or image processing pipeline, or IPP, is illustrated in FIG. 1. The example IPP shown in FIG. 1 includes an exposure and white balance module 2, a demosaic block 4, a color correction block 6, a gamma correction block 8, a color conversion block 10 and a downsampling module 12.

When it is desired to implement a real-time video imaging system, there are often significant constraints with such an IPP, because image data is typically read from memory at each stage of the IPP and then written back after some operations. For HD video, memory bandwidth becomes a significant challenge. Thus, it is desired to implement elements of the IPP directly in hardware within video acquisition devices. This would have the advantage that elements of the IPP avoid the challenge of writing image data to memory after each stage of processing, and reading back the data for each subsequent IPP operation. However, it implies that the methods applied at each stage of the IPP could be less adaptable, as the entire IPP chain would be configured prior to inputting data from a single image frame.

Modern digital still cameras (DSC) implement more sophisticated image and scene analysis than can be provided by a basic IPP as illustrated with some example blocks at FIG. 1. In particular, image acquisition devices can detect and track face regions within an image scene (see U.S. Pat. Nos. 7,620,218, 7,460,695, 7,403,643, 7,466,866 and 7,315,631, and US published application nos. 2009/0263022, 2010/0026833, 2008/0013798, 2009/0080713, 2009/0196466 and 2009/0303342 and U.S. Ser. No. 12/374,040 and Ser. No. 12/572,930, which are all assigned to the same assignee and hereby incorporated by reference), and these devices can analyze and detect blemishes and imperfections within such regions and correct such flaws on the fly (see the above and U.S. Pat. No. 7,565,030 and US published application no. 2009/0179998, incorporated by reference). Global imperfections such as dust blemishes or “pixies” can be detected and corrected (see, e.g., U.S. Ser. No. 12/710,271 and Ser. No. 12/558,227, and U.S. Pat. Nos. 7,206,461, 7,702,236, 7,295,233 and 7,551,800, which are all assigned to the same assignee and incorporated by reference). Facial enhancement can be applied. Image blur and image motion, translational and rotational, can be determined and compensated (see, e.g., U.S. Pat. No. 7,660,478 and US published application nos. 2009/0303343, 2007/0296833, 2008/0309769, 2008/0231713 and 2007/0269108 and WO/2008/131438, which are all incorporated by reference). Facial regions can be recognized and associated with known persons (see, e.g., U.S. Pat. Nos. 7,567,068, 7,515,740 and 7,715,597 and US 2010/0066822, US 2008/0219517 and US 2009/0238419 and U.S. Ser. No. 12/437,464, which are all incorporated by reference). All of these techniques and others (see, e.g., U.S. Pat. Nos. 6,407,777, 7,587,085, 7,599,577, 7,469,071, 7,336,821 and 7,606,417 and US published application nos. 2009/0273685, 2007/0201725, 2008/0292193, 2008/0175481, 2008/0309770, 2009/0167893, 2009/0080796, 2009/0189998, 2009/0189997, 2009/0185753, 2009/0244296, 2009/0190803 and 2009/0179999 and U.S. Ser. No. 12/636,647, which are assigned to the same assignee and hereby incorporated by reference) rely on an analysis of an image scene. Typically, this involves the reading of blocks of image data from a memory store followed by various processing stages of this data. Intermediate data structures may be stored temporarily within the image store to facilitate each scene analysis algorithm. In some cases, these data are specific to a single algorithm, while in others, data structures may persist across several different scene analysis algorithms. In these cases, image data is moved between image store memory and a CPU to perform various image processing operations. Where multiple algorithms are applied, image data is typically read several times to perform different image and scene processing operations on each image.

For most of the above techniques, analysis may involve a preview image stream, which is a stream of relatively low-resolution images captured by most digital cameras and used to provide a real-time display on the camera display. Thus, in order to properly analyze the main image scene, it is useful to have at least two images of substantially the same scene available. Where one or more preview images are also stored, these are also typically read on multiple occasions in combination with the main acquired (full resolution) image. In addition, processing may involve temporarily storing upsampled copies of preview images or downsampled copies of main acquired images to facilitate various scene analysis algorithms.

Within a digital camera, images are typically acquired individually and a substantial time interval, typically of the order of a second or more, is available between image acquisitions for scene analysis and post-processing of individual images. Even where multiple images are acquired in close temporal proximity, e.g., in a burst mode of a professional DSC, a finite number of images may be acquired due to limited memory. Furthermore, these images cannot be processed during the burst acquisition, but often wait until it is completed before more sophisticated scene-based processing can be implemented.

Within a modern video appliance, data is often processed at frame rates of 30 fps or more, and due to memory constraints, the data is digitally compressed and written to a long-term memory store more or less immediately. Furthermore, a low-resolution preview stream is not generally available as in the case of a DSC. Finally, the requirements of handling a full-HD video stream imply that memory bandwidth is challenging within such an appliance.

In order to achieve the benefits of modern scene analysis techniques such as are presently available within a DSC for an HD video acquisition device, we can thus identify several key challenges. Firstly, it is difficult to store and perform complex scene analysis on a full HD frame within the time available between video frame acquisitions. This is not simply a matter of CPU power, but perhaps more importantly a matter of data bandwidth. The size of full HD images implies that it is very challenging simply to move such images through an IPP and into a video compression unit and onto long-term storage. While some limited scene analysis may be possible through hardware additions to the IPP, this would likely involve many settings and configurations that are fixed prior to beginning real-time acquisition of the video stream, such that they would not be dynamically adaptable and responsive to ongoing scene analysis.

Secondly, there is no scope to share image processing data primitives between scene analysis algorithms without introducing very large shared memory buffers into the IPP. This would lead to hardware design requirements which are unreasonable and effectively mimic the existing state of the art, illustrated in FIG. 2, within a single IC. FIG. 2 illustrates conventional hardware to implement an IPP and other high-level functions in software. A memory 14 is shown that includes an image and data cache 16 as well as a long-term data store 18. The cache 16 can store raw data 20, RGB formatted data 22 and RGB processed data 24, while the long-term data store 18 may hold MPEG images 26 and/or JPEG images 28. A sensor 32 communicates raw data to the memory 14 and to the IPP 34. The IPP 34 also receives data from the memory 14. The IPP 34 provides RGB data 22 to the memory 14, 16. RGB data 22, 24 is also retrieved by the CPU 36, which provides processed RGB data 24 to the memory 14 and RGB data 22 to a transcode module 38. The transcode module 38 provides data to and retrieves data from the memory 14, 18. The transcode module also provides data to be shown on, e.g., an LCD/TFT display 40.

For various practical reasons, this does not provide an optimal image processing mechanism. An alternative is to have separate hardware implementations for each scene analysis algorithm, but this will also lead to very large hardware sizes, as each algorithm would need to buffer a full image frame in order to perform full scene analysis.

There are many additional engineering subtleties within each of these broad areas, but it is possible to identify a broadly scoped challenge wherein current scene analysis techniques and resulting image enhancement benefits cannot sensibly be applied to real-time video using current state-of-the-art techniques. An advantageous set of embodiments is therefore provided below.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a conventional image processing pipeline (IPP).

FIG. 2 illustrates conventional hardware to implement an IPP and other high-level functions in software.

FIG. 3 illustrates a homogeneous systolic array architecture.

FIG. 4 illustrates IPP hardware with advanced hardware for image processing, or AHIP, including a pass-through characteristic, in accordance with certain embodiments.

FIG. 5 illustrates an AHIP module including several generic processing modules arranged into processing chains for various image processing primitives, in accordance with certain embodiments.

FIG. 6 illustrates an inter-relationship between a main CPU, a hardware module, an image sensor and SDRAM read/write channels in accordance with certain embodiments.

FIG. 7 illustrates interrelationships between a memory store including an image and data cache with an AHIP module in accordance with certain embodiments.

FIG. 8 illustrates a color thresholding module in accordance with certain embodiments.

FIG. 9 illustrates an original image and skin-color map with 16 thresholds (4 bit) in accordance with certain embodiments.

FIG. 10 illustrates an AHIP module in accordance with certain embodiments that outputs color space conversion, color thresholding, frame-to-frame registration, Integral Image and/or Integral Squared Image primitives and/or image frame histogram data.

FIG. 11 illustrates processing relationships between an AHIP module and other image processing components in accordance with certain embodiments.

DETAILED DESCRIPTION OF THE EMBODIMENTS

Embodiments are described below that include a hardware sub-system which generates a range of image processing primitives derived in real-time from an image frame which is input serially, pixel-by-pixel, with a delay which is significantly less than that associated with acquiring a full image frame. These primitives are available very soon or even almost immediately after an image frame has been acquired and can be used to further process this image frame while the next image frame is being acquired.

In addition, data determined from the processing of a previous image frame in a sequence can be made available for combination with image processing primitives derived from the present image frame. This enables detailed frame-by-frame scene processing without separately capturing a low-resolution preview stream of images (although such a stream may be optionally utilized in certain embodiments).

Embodiments are also described that operate using a one pixel per clock cycle input and/or that generate a number of different types of image processing primitives which provide useful knowledge about the current image/video frame. Each primitive is generated by a processing chain which comprises one or more pixel processing blocks, or modules. These are linked together by a plurality of internal data busses which may be dynamically switched. In certain less complex embodiments, modules may have directly linked data paths, although the primary input modules may share a common input data path from the image sensor/IPP. Multiple modules may share the same input data. Further, the output of individual processing blocks may be combined logically. The individual outputs from multiple processing chains are typically combined into a single data word before being output to external memory (SDRAM), as this facilitates optimal use of memory and external memory busses. Because of the differences in processing time between processing chains, a synchronization module is integrated with logical circuitry to ensure correct alignment of the output data.

The generated image primitives can advantageously be used to accelerate the performance of a range of image processing operations including red-eye detection, face detection and recognition, face beautification, frame-to-frame image registration, and multi-frame image joining for creation of panorama images, among many more applicable image processing techniques. Further, the availability of these primitives greatly simplifies the implementation of a range of scene analysis and processing algorithms. This can advantageously reduce, in particular, the tendency to read and write the full image frame from the memory store for subsequent processing on a CPU or GPU. In most cases, the relevant image primitive(s) and the main image are only read once in order to analyze and/or enhance the image with a particular algorithm. It is also possible to load primitives for multiple algorithms together with a single read of the main acquired image in order to execute these multiple algorithms on a single image read. This greatly reduces the memory bandwidth utilized to process a video stream. Where separate read/write buses are available, it is also possible to process one image frame on a main CPU/GPU while a second image frame is being acquired and pre-processed by the IPP and AHIP modules.

Further, this system configuration enables data derived from the analysis of an image frame being processed by the CPU/GPU to be fed back to the IPP or the AHIP module to adapt the pre-processing of a following image frame. This detailed adaptation of both the global image processing applied by the IPP and the scene-specific image processing applied by the AHIP enables faster and more responsive performance of a video acquisition appliance. This, in turn, allows faster adaptation of video acquisition in situations where lighting conditions are changing, e.g., based on an analysis of face regions and associated color maps of the skin. Such techniques are advantageously now applicable to video acquisition in accordance with certain embodiments.

In this regard, a frame counter and associated logic may also be utilized in certain embodiments. At the end of each frame processing cycle, it is possible to reconfigure internal pixel processing chains. This may involve loading new LUTs, changing the processing parameters of individual pixel processing blocks or, in some cases, reconfiguring the order or logical combination of blocks in a processing chain. In certain embodiments, modules are either selected or bypassed. In more sophisticated embodiments, data processing modules share an I/O port on one or more internal data-busses. In certain embodiments, double-buffered I/O may be employed to enable near simultaneous read/write operations to/from a module.

In accordance with certain embodiments, a dynamically reconfigurable heterogeneous systolic array is configured to process a first image frame, to generate image processing primitives from the image frame, and to store the primitives and the corresponding image frame in a memory store. Based on a determined characteristic of the image frame, the dynamically reconfigurable heterogeneous systolic array is reconfigurable to process a following image frame.

The determining of the characteristic of the image frame may be performed external to the systolic array. The determining of the at least one characteristic may be based on processing at least a portion of the image frame and on at least one of the image processing primitives.

An image acquisition and processing device in accordance with certain embodiments includes a processor, a lens and image sensor for acquiring digital image frames, and any of the dynamically reconfigurable heterogeneous systolic arrays configured as described in accordance with any of the embodiments herein.

A method of image processing using a dynamically reconfigurable heterogeneous systolic array in accordance with certain embodiments includes acquiring and processing a first image frame. Image processing primitives are generated from the image frame. The primitives and the corresponding image frame are stored in a memory store. At least one characteristic of the image frame is determined, and based on the at least one characteristic, the method includes reconfiguring the array to process a following image frame.

The determining of the at least one characteristic may be performed external to the systolic array. The determining of the at least one characteristic may be based on processing at least a portion of the image frame and at least one of the image processing primitives.

A further digital image acquisition and processing device is provided in accordance with certain embodiments. The device includes a lens and image sensor for acquiring digital images. An image processing pipeline (IPP) module of the device is configured to receive raw image data of an image frame and distribute corresponding formatted image data to a memory and a hardware module configured to perform image processing. The hardware module receives the formatted image data of the image frame from the IPP module and provides scene processing primitives to the memory. A processing unit is configured to process a following image frame based on the scene processing primitives received from the memory and provided by the hardware module.

The scene processing primitives may include processed image maps or other processed hardware module data, or both. The hardware module may also be configured to provide formatted image data to the memory. The processed image maps may include threshold maps or integral image maps, or both. The scene processing primitives may include regional primitives or frame data primitives, or both. The processing unit may also receive formatted image data from the memory.

The device may include a transcode module configured to receive processed image data from the processing unit and to provide to or receive from the memory compressed image or video data, or both. The transcode module may be configured to output selected subsets of the image data or video data, or both, to a display.

The hardware module may be configured to process image data within a single clock cycle. The processing unit may be further configured to program the hardware module or the IPP module, or both.

Another method of acquiring and processing digital images is provided in accordance with certain embodiments, including acquiring digital images, and receiving raw image data of an image frame and distributing corresponding formatted image data to a memory and a hardware-based image processing module. The method also includes receiving at the hardware-based image processing module the formatted image data of the image frame and providing scene processing primitives to a memory, and storing in the memory the scene processing primitives provided by the hardware-based image processing module. A following image frame is processed based on the scene processing primitives received from the memory and provided by the hardware-based image processing module.

The scene processing primitives may include processed image maps or other processed hardware module data, or both. Formatted image data may be provided to the memory from the hardware-based image processing module. Image maps including threshold maps or integral image maps, or both, may be processed. The scene processing primitives may include regional primitives or frame data primitives, or both. Formatted image data may be received from the memory. Processed image data may be transmitted to a transcode module, and compressed image or video data may be communicated to or from the transcode module, or combinations thereof. Selected subsets of the image data or video data, or both, may be output to a display. Image data may be processed within a single clock cycle with the hardware-based image processing module. The hardware-based image processing module may be programmable and/or reconfigurable.

A hardware-based image processing module is further provided for operating within an image acquisition and processing device that acquires raw image data and distributes corresponding formatted image data to the hardware-based image processing module, which is configured to generate and provide scene processing primitives to a memory based on the formatted image data, to be stored in the memory for processing a following image frame based on the scene processing primitives received from the memory and provided by the hardware-based image processing module.

The scene processing primitives may include processed image maps or other processed hardware module data, or both. The module may be configured to provide formatted image data to the memory and/or to process image maps including threshold maps or integral image maps, or both. The scene processing primitives may include regional primitives or frame data primitives, or both. The module may be configured to receive and process formatted image data from the memory and/or to transmit processed image data to a transcode module and communicate compressed image or video data to or from the transcode module, or combinations thereof. The module may also be configured to output selected subsets of the image data or video data, or both, to a display and/or to process image data within a single clock cycle with the hardware-based image processing module. The module may be programmable and/or reconfigurable.

A dynamically reconfigurable heterogeneous systolic array is configured in accordance with certain embodiments to process a first image frame; generate a plurality of image processing primitives from said image frame; identify the location of at least one subregion of said image frame and generate at least one image processing primitive from said subregion; and store said primitives along with the image frame and subregion location in a memory store. The dynamically reconfigurable heterogeneous systolic array is reconfigurable to process a following image frame based on one or more determined characteristics of the image frame or subregion, or both.

The determining of the at least one characteristic may be performed external to the systolic array. The determining of the at least one characteristic may be based on processing at least a portion of the image frame and at least one of the image processing primitives.

An image acquisition and processing device in accordance with certain embodiments includes a processor; a lens and image sensor for acquiring digital image frames; and a dynamically reconfigurable heterogeneous systolic array configured in accordance with any of the embodiments described herein.

A method of image processing using a dynamically reconfigurable heterogeneous systolic array is also provided, including acquiring and processing a first image frame; generating a plurality of image processing primitives from said image frame; identifying the location of at least one subregion of said image frame and generating at least one image processing primitive from said subregion; storing said primitives, along with the image frame and subregion location, in a memory store; determining at least one characteristic of the image frame or subregion, or both; and based on said characteristic, reconfiguring said array to process a following image frame.

The determining of the at least one characteristic may be performed in part external to the dynamically reconfigurable heterogeneous systolic array. The determining of the at least one characteristic may be based on processing at least a portion of the image frame and at least one of said image processing primitives.

Systolic Arrays

The systolic array paradigm, i.e., data-stream-driven by data counters, is the counterpart of the von Neumann paradigm, i.e., instruction-stream-driven by a program counter. Because a systolic array usually sends and receives multiple data streams, and multiple data counters are used to generate these data streams, it supports data parallelism. The name derives from an analogy with the regular pumping of blood by the heart.

A systolic array is composed of matrix-like rows of data processing units called cells. Data processing units, or DPUs, are similar to central processing units (CPUs), except for the lack of a program counter, since operation is transport-triggered, i.e., by the arrival of a data object. Each cell shares information with its neighbours immediately after processing. The systolic array is often rectangular or otherwise has its cells arranged in columns and/or rows where data flows across the array between neighbour DPUs, often with different data flowing in different directions. FIG. 3 illustrates such an example of a homogeneous systolic array architecture. The data streams entering and leaving the ports of the array are generated by auto-sequencing memory units, or ASMs. Each ASM includes a data counter. In embedded systems, a data stream may also be input from and/or output to an external source.

Systolic arrays may include arrays of DPUs which are connected to a small number of nearest neighbour DPUs in a mesh-like topology. DPUs perform a sequence of operations on data that flows between them. Because traditional systolic array synthesis methods have been practiced by algebraic algorithms, only uniform arrays with only linear pipes can be obtained, so that the architectures are the same in all DPUs. A consequence is that only applications with regular data dependencies are generally implemented on classical systolic arrays.

Like SIMD (single instruction/multiple data) machines, clocked systolic arrays compute in “lock-step,” with each processor undertaking alternate compute/communicate phases. However, systolic arrays with asynchronous handshaking between DPUs are often called wavefront arrays. One well-known systolic array is Carnegie Mellon University's iWarp processor, which has been manufactured by Intel. An iWarp system has a linear array processor connected by data buses going in both directions.

AHIP (Advanced Hardware for Image Processing)

FIG. 4 schematically illustrates an embodiment that includes IPP hardware with AHIP (advanced hardware for image processing). The AHIP illustrated at FIG. 4 has a pass-through nature. FIG. 4 shows a memory store 44 including an image and data cache 46 and long-term data store 48. The cache 46 includes raw data 50, RGB data 52 and processed RGB data 54, and the long-term data store may include MPEG images 56 and/or JPEG images 58. FIG. 4 also shows a sensor 72 that communicates raw data to the memory 44 and to an IPP 74.

The IPP 74 also receives raw data from the memory 44. The IPP 74 provides RGB data 52 to the memory 44, 46. RGB data is provided to an advantageous AHIP module 75 by the IPP 74. The AHIP module 75 provides processed image maps, AHIP module data and RGB data to the memory 44, 46. The memory 44, 46 provides RGB data, image maps and AHIP module data to the CPU/GPU 76. The CPU 76 provides processed RGB data 54 to the memory 44 and to a transcode module 78. The CPU 76 can also program the IPP module 74 and/or the AHIP module 75, as schematically illustrated at FIG. 4. The transcode module 78 provides data to and retrieves data from the memory 44, 48. The transcode module 78 also provides data to be shown on, e.g., an LCD/TFT display 80.

Advantageously, in certain embodiments one standard image pixel is taken per clock cycle and this pixel is processed in one or more of a variety of ways. Several different types of output may be generated in parallel from the processing of each individual pixel. More than one instance of each output type can be provided by duplication of hardware elements. Because this hardware sub-system can process a pixel on every clock cycle, it does not delay the transfer of image pixels from the sensor and thus it can be inserted at any stage of the IPP.

A number of generic types of image processing primitives can be identified and are generated by the AHIP module. To clarify the following discussion, image data may be referred to as “pixels” (picture elements) and data values in an output primitive may be referred to as “map-pixels.” Typically a map-pixel will be significantly smaller than a pixel (24 or 32 bits). As examples, one form of map-pixel used for skin maps has only two bits, corresponding to 4 probabilities of the original pixel being a skin pixel. Another map-pixel has 4 bits, corresponding to 16 thresholds describing how similar it is to a particular color in a predefined color space. The color-space thresholds corresponding to these 16 levels of similarity are stored in a LUT, with the final output data primitive map comprising map-pixels.

The first such primitive includes a direct pixel to map-pixel mapping. In certain embodiments, this may include a color or luminance thresholding which determines how close a particular pixel is to a predetermined value in the color space. In certain embodiments, this data may be captured as a range of 15 thresholds and written into a 4-bit image map. These thresholds can be adjusted from image frame to image frame by the CPU/GPU.

In an exemplary embodiment, the data values of each threshold are set to measure how close image pixels are to a skin color. Such an image map can advantageously be used to differentiate different skin areas of a facial region and can be useful for applications such as face tracking and facial beautification.

This form of image primitive only incurs a small fixed delay in terms of clock cycles. The output image map is typically available within a few tens of clock cycles after the last pixel of an image frame is input to the AHIP.

One variant on the pixel to map-pixel processing is when multiple pixels are processed, generating a single output pixel. This corresponds to a subsampling of the RAW input image. In some embodiments, a block of pixels is stored temporarily in hardware line buffers before being processed to generate the corresponding output primitive data. In alternative embodiments, pixels continue to be processed individually but the outputs from processing each individual pixel are combined in some predetermined way to generate a single map-pixel.

A second form of primitive is a kernel-derived primitive. The map-pixels for such primitives are derived from knowledge of the current image pixel and at least one previous pixel of the image. Many techniques specify N×N kernels, which implies that the output value corresponding to the current pixel is determined from N adjacent pixels in both horizontal and vertical directions within the image. As pixel data is typically only available to the AHIP module sequentially in certain embodiments, it will be clocked directly in those embodiments, row by row (and/or column by column), from the image sensor through the IPP. Full rows of image data would typically be buffered in these embodiments to support kernel-derived primitives.

In certain embodiments, seven (7) rows of image data are stored in their entirety and an 8th image row is rolled over. This enables the module to generate image processing primitives derived from up to an 8×8 kernel, as illustrated by the sketch below. In this embodiment, there is a delay of the order of 8 times the row size of the image (8×1920 for 1080p) before a full kernel primitive image map is available for the current image frame. Nevertheless, this is still less than 1% of the total time taken to acquire the full image frame (1000 pixel rows), so that the image frame primitive data is available very shortly after the final frame acquisition is completed.
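
The following minimal sketch (in C, not taken from the source) illustrates the rolling row-buffer idea: seven complete rows are retained while an eighth is filled, and each new pixel that completes an 8×8 neighborhood can fire a kernel computation. The row length, the callback interface and the zero-initialized state are illustrative assumptions.

```c
#include <stdint.h>

#define ROWS     8
#define ROW_SIZE 1920                 /* e.g., one 1080p row; an assumption */

typedef struct {                      /* caller zero-initializes */
    uint8_t line[ROWS][ROW_SIZE];     /* 7 complete rows + 1 rolling row  */
    int     col;                      /* write position in the newest row */
    int     head;                     /* index of the oldest buffered row */
    long    rows_done;                /* number of completed rows so far  */
} row_buffer;

/* Clock in one luminance pixel; call kern() for each pixel that has a
 * complete 8x8 neighborhood ending at the current column. */
static void push_pixel(row_buffer *rb, uint8_t y,
                       void (*kern)(uint8_t win[ROWS][ROWS]))
{
    rb->line[(rb->head + ROWS - 1) % ROWS][rb->col] = y;

    if (rb->rows_done >= ROWS - 1 && rb->col >= ROWS - 1) {
        uint8_t win[ROWS][ROWS];
        for (int r = 0; r < ROWS; r++)          /* oldest row first */
            for (int c = 0; c < ROWS; c++)
                win[r][c] = rb->line[(rb->head + r) % ROWS]
                                    [rb->col - (ROWS - 1) + c];
        kern(win);                    /* emit one kernel-derived map-pixel */
    }
    if (++rb->col == ROW_SIZE) {      /* row complete: roll the buffer */
        rb->col  = 0;
        rb->head = (rb->head + 1) % ROWS;
        rb->rows_done++;
    }
}
```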

One particular example of a kernel-derived image processing primitive is that of red-eye segmentation. In U.S. Pat. No. 6,873,743, for example, which is incorporated by reference, a technique for performing a 2×2 segmentation on an image is described. This operates in LAB color space. Another example of a kernel-derived primitive is the calculation of the integral image, which is performed on the luminance component of an image. As will be explained shortly, the AHIP incorporates in certain embodiments a color-space transform module which enables on-the-fly conversion of input pixel data between several commonly used color spaces. Thus individual RGB pixel data can be converted to YCC or LAB color space with negligible delay within the AHIP.

A third form of primitive includes frame-derived primitives. These are examples of data primitives where a single pixel or a block of pixels does not generate a corresponding single map-pixel output. One example of this form of image processing primitive is a histogram module which is preconfigured with a number of bins. Input pixel data is analyzed against a set of thresholds and classified into a histogram bin based on its value. At the end of an image frame, each histogram bin contains a count of the number of pixels which satisfy its upper and lower threshold limits.
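
A minimal sketch of such a histogram module, assuming a 16-bin layout with programmable per-bin upper limits (the interface is illustrative, not the actual AHIP register map):

```c
#include <stdint.h>
#include <string.h>

#define BINS 16

typedef struct {
    uint8_t  upper[BINS];      /* programmable upper limit of each bin */
    uint32_t count[BINS];      /* per-bin pixel counts for the frame   */
} hist_module;

static void hist_reset(hist_module *h)
{
    memset(h->count, 0, sizeof h->count);
}

/* Classify one pixel value into the first bin whose upper limit it
 * does not exceed; counts are read out after the whole frame. */
static void hist_clock_pixel(hist_module *h, uint8_t value)
{
    for (int b = 0; b < BINS; b++)
        if (value <= h->upper[b]) { h->count[b]++; return; }
}
```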

When combined with the example given for primitive type one, it is possible to measure how many pixels in a particular image frame fell within a set of 16 skin-color histogram bins. This, in turn, may suggest that skin color thresholds need to be adjusted for the next image frame if, for example, too many, or too few, skin pixels were detected within a tracked face region. The hardware architecture within the AHIP is designed to enable processing blocks to be dynamically reconfigured prior to processing an image frame. Additional parallel processing blocks can be added to such a hardware architecture in accordance with certain embodiments.

In other embodiments, a color correlogram or other forms of histogram-like data may be determined by the AHIP. Such primitives may advantageously be determined from the same set of row buffers used to generate kernel data, the difference being that histogram or correlogram data provides frame-derived primitives determined from multiple pixels rather than the one-to-one mapping of input to output pixels provided by kernel-derived primitives.

Another form of frame-derived primitive includes one that performs a summation of pixel row and pixel column values. This enables a correlation of the current image frame with one or more previous image frames, as sketched below. Such primitives introduce another aspect of the AHIP, where one or more of the primitives determined from the current frame may be remembered for one or more subsequent image frames. Such primitives should be significantly smaller than the full image frame, or the advantages of real-time processing will not be fully realized. In certain embodiments, a typical size limit for such primitives is that they are no larger than the maximum row size of the processed image.
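
A minimal sketch of how such row/column summations can drive frame-to-frame registration: luminance is projected onto row and column sums while the frame streams through, and the projections of consecutive frames are compared over a small shift range. The dimensions, the ±16 search window and the sum-of-absolute-differences criterion are illustrative assumptions.

```c
#include <stdint.h>
#include <limits.h>

#define W 1920
#define H 1080
#define SEARCH 16                  /* +/- shift range; an assumption */

/* Accumulate per-row and per-column luminance sums for one frame.
 * Caller zero-initializes row[] and col[]. */
static void project(const uint8_t *luma, uint32_t row[H], uint32_t col[W])
{
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++) {
            row[y] += luma[y * W + x];
            col[x] += luma[y * W + x];
        }
}

/* Best shift of curr against prev by sum of absolute differences:
 * applied to col[] it estimates dX, applied to row[] it estimates dY. */
static int best_shift(const uint32_t *prev, const uint32_t *curr, int n)
{
    int best = 0;
    unsigned long long best_sad = ULLONG_MAX;
    for (int s = -SEARCH; s <= SEARCH; s++) {
        unsigned long long sad = 0;
        for (int i = SEARCH; i < n - SEARCH; i++) {
            long long d = (long long)prev[i] - (long long)curr[i + s];
            sad += (unsigned long long)(d < 0 ? -d : d);
        }
        if (sad < best_sad) { best_sad = sad; best = s; }
    }
    return best;
}
```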

In certain embodiments, such data may be retained within the AHIP module rather than being written to the external memory store. Thus image frame derived data and/or pixel derived data may be accumulated within the AHIP to facilitate faster processing of image frame primitives.

A fourth form of primitive is derived from a specific spatial region of the main image frame. These primitives may be more complex in nature and may combine more complex hardware processing of a region with some base primitives and external data derived from the CPU/GPU and relating to one or more preceding image frames.

Hardware buffers may be used in processes that involve predicting locations of face regions in digital images (see, e.g., U.S. Pat. No. 7,315,631 and its progeny, and U.S. Pat. No. 7,466,866, e.g., incorporated by reference above). In certain embodiments, knowledge is gathered in one or more previous frames as to where one can expect a face to be detected within the current image frame. This approach has the advantage of being faster than performing face detection in the current frame, and the gathered information may be used for various purposes even before the current image frame.

In particular, it is generally difficult to determine a highly precise location of such a region during a first processing of an image frame because this depends on additional image processing to be performed in software on the GPU/CPU. As a consequence, it is generally only determined approximately where a spatial region is during a first hardware processing of an image frame by AHIP. However, these approximate locations can be advantageously marked and are typically significantly smaller than the main image. In one embodiment, several such predicted regions may be stored within buffers of the AHIP for further processing on the next frame cycle. In an alternative embodiment, these are written to memory with the main image, but are loaded back through a second AHIP module configured especially to process such regions. In this second embodiment, advantage is taken of the fact that the memory subsystem is dual-ported. Thus, when the next image frame is being processed by the primary AHIP and written to memory, the one or more predicted regions from the previous image frame may be read back to the second AHIP module for more specialized processing. In this embodiment, specific image regions would be processed typically only while the next image frame is being generically processed. Nevertheless, a single frame delay can be easily compensated for and does not compromise the goal of achieving close to real-time processing of a video sequence.

One very common spatial region is a predicted face region. This is a region of the current image frame within which it is highly probable that a face region will be located. Such regions are frequently used in face tracking algorithms (again see U.S. Pat. No. 7,315,631 and its progeny, incorporated by reference above). One common use of such regions is to restrict the application of an image processing algorithm, such as red-eye detection, to an image region where there is a high probability that a face will occur.

FIG. 5 schematically illustrates an AHIP module with several generic processing modules arranged into processing chains for various image processing primitives. An image sensor 82, SDRAM memory 84, the AHIP module 85 itself, and a CPU 86 are shown in FIG. 5. The AHIP module includes an AHIP configuration manager 90 that communicates with the CPU 86. The AHIP module 85 also includes a look-up table (LUT) module 92, a data configuration module 94, a logic module 96, and a synch module 98. As previously illustrated in FIG. 4, certain RGB data 102 is stored straight away into the memory 84. However, certain other RGB data 103 is processed by the AHIP module 85 at one or more pixel processing modules 106, one or more frame processing modules 107, one or more region processing modules 108 and one or more kernel processing modules 110. Certain RGB data 103 may be processed at a frame processing module, and that frame data 112 is stored in the memory 84. Certain RGB data 103 may be processed at one or more pixel processing modules 106, and then either pixel data 114 is stored in the memory 84 or that data is further processed at a kernel processing module 110 and/or at a frame processing module 107 and/or a region processing module 108. RGB data 116 of an adjacent frame, such as a previous frame (indicated by N−1), may be processed together with the RGB data 103 at a region processing module 108. Data processed at the region processing module 108 may then be stored in memory 84 as region data 118.

It may often be desirable to apply an algorithm to portions of an image before, e.g., it is possible to even make a full confirmation of the precise location of a face region. These predicted face regions can be determined from previous image frames and can take advantage of a history of face and camera movement over a number of preceding frames. In this regard, the frame-to-frame dX and dY displacements may be determined by the AHIP module and may be available within a short delay after the last pixel of a frame is processed. Similarly, the location and size of a face region may be accurately known from the last one or more frames of a face tracker algorithm, and both of these data can be available very soon after the processing of a new image frame is started by AHIP. These data can advantageously enable an accurate and dynamic estimation of predicted face regions for the current image frame (see US published application nos. US 2009/0303342 and US 2009/0263022, and U.S. Pat. Nos. 7,460,695, 7,403,643 and 7,315,631, as well as US 2009/0123063, US 2009/0190803, US 2009/0189998, US 2009/0179999 and U.S. Ser. Nos. 12/572,930, 12/824,214 and 12/815,396, which are all assigned to the same assignee and hereby incorporated by reference).

Face regions or face feature regions such as eye regions and/or mouth regions and/or half-face regions (see U.S. Ser. Nos. 12/790,594 and 12/825,280, which are assigned to the same assignee as the present application and hereby incorporated by reference) can be initially saved to a local memory buffer. Typically, as memory is expensive in a hardware core, there may be a limited number of “face buffers,” and they may be optimized to minimize size. Indeed, in some embodiments such buffers may be external to the AHIP and may involve bandwidth to write these regions to the main external memory store. In other embodiments, their locations within the main image may be recorded, such that they may be later accessed within the main image frame data from the main memory store. In one embodiment, such regions will be stored internally within the AHIP module. However, as the size of the AHIP buffer may or may not be too large for full HD video, in an alternative embodiment these face regions are stored with the main image in main memory and reloaded into a secondary AHIP module while the next main image frame is being processed and written to main memory.

In these embodiments, as many face detection and recognition techniques can be applied to a fixed size face image, these memory buffers may be large enough to accommodate such a fixed size face region with some additional capacity. The additional capacity may be used to compensate for a number of factors: (i) the image region is predicted and thus the highly precise location of a face is not known at the time of initial AHIP processing of the image; (ii) the face region may not be correctly upright and may need to be rotated in-plane to obtain a properly upright face region; this information may be made available later, after the precise location of the face region is determined and the location of the eye-regions within this face region is determined; at such time, an in-plane correction angle and re-sizing information can be provided to the AHIP, but until such time the face region may be at an in-plane angle involving additional memory space to accommodate diagonally oriented face regions; (iii) the face region may change in global size from frame to frame; while some trends may be known from the history of past image frames, it is also possible that there may have been a change from these trends. Thus, the precise size of a face may or may not be known until the regional data is processed by the CPU/GPU after the main image frame processing is completed. In an alternative embodiment where face regions are loaded and processed while a next RAW image frame is processed and written, some additional processing of the main image and its associated primitives can be available from the CPU/GPU. In such cases, a more precise estimate of the location of the face region may be available.

In another exemplary embodiment, it is assumed that a final fixed-size face region of maximum size 32×32 is involved. A memory buffer may be used for the face region of 64×64 to accommodate potential fractional resizing up to 1.99 and rotation of up to 45 degrees. This defines the buffer size for the precisely located face region; however, when the current frame is first processed by AHIP, we only know the predicted region of that image frame where a face is likely to exist. Accordingly, it may be useful to store a larger region, e.g., a buffer of 96×96 or 128×128 pixels may be used, providing for predicted face region buffers which are 50% or 100% larger than the dimensions of the expected face region.
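
As a rough check of this buffer arithmetic (an illustrative reading of the figures above, not a statement from the source, and it assumes the resize and rotation allowances are budgeted separately): the fractional resize bound nearly fills the 64×64 buffer, and the bounding box of a square of side s rotated in-plane by an angle θ grows by a factor of |cos θ|+|sin θ|, which peaks at √2 at 45 degrees:

$${32 \times 1.99} \approx 63.7 \leq 64, \qquad s^{\prime} = s\left( \left| \cos\theta \right| + \left| \sin\theta \right| \right), \qquad s^{\prime}\left( 45{^\circ} \right) = s\sqrt{2} \approx 1.41\, s$$

On this reading, the 96×96 and 128×128 predicted-region buffers then supply the 50% and 100% margins quoted above on top of the 64×64 buffer for the precisely located face.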

When data about the location and size of predicted face regions is passed from software processing of the previous image frame to the AHIP, these predicted face regions are integer-downsized into respective AHIP buffers of 96×96 or 128×128. Typically, for processing of face regions it is only necessary to retain luminance values, and thus these buffers are single-valued and full color data is typically not retained. However, in some embodiments a form of skin-map may be generated and stored with the luminance image of the face region. Typically a skin map for face tracking purposes may be only 2 bits, representing only 3 or 4 states of skin-pixel.

An example of such a primitive is the application of face classifiers to a tracked face region. Typically the face region will not be exactly horizontal, and so data from the predicted face region may be first rotated into a horizontal orientation prior to applying the face classifiers. Such manipulations require larger buffers for image data, and so the number of regional buffers and the size of each may be somewhat limited. The primitives output in this example are a set of data values corresponding to a feature vector set which can be used to confirm that (i) this region still contains a verified face, and/or (ii) the feature vector set can be matched against a set of known faces to indicate a likely match.

FIG. 6 illustrates inter-relationships between a main CPU, an AHIP hardware module, an image sensor and read/write channels 125 of SDRAM memory 126. The AHIP/CPU interface 128 provides certain registers and/or interrupts. A B2Y interface 130 for YUV data is shown in FIG. 6 between the sensor 124 and the AHIP block 122.

Several principal types of image frame processing are implemented within the AHIP. Certain of these are now described below.

Frame-to-Frame AHIP Processing

FIG. 7 illustrates use of data from frame N−1 combined with a current frame N. The memory store (image data and cache) 140 is segmented according to the two frames in the illustrative example of FIG. 7. Some of the features have been described with reference to FIG. 4 and are not repeated here. With respect to face tracking as an example, predicted regions 142 are stored in memory 140 for both frame N−1 and frame N. Within the frame N−1 portion of the memory 140, refined regions and regional primitive data 143 are stored. These are used to generate RGB image frame data and frame primitives 145 for frame N. Frame N−1 regional data 144 is input to the AHIP block 148 along with frame N RGB data 149. The AHIP block subsamples and/or color-space re-maps certain of the frame N−1 regional data at block 150, and then moves all of the frame N−1 regional data 144 through a pixel processing block 154 to be output to memory 140 at the frame N portion. The frame N RGB data 149 may be sub-sampled 156 and/or color-space re-mapped 158, or neither, prior to being pixel processed at block 160. The pixel processing 160 may include cumulative, direct, kernel, region and/or hybrid pixel processing. The data is then stored at the frame N portion of the memory 140.

Certain of the various sub-elements of the AHIP module in accordance with certain embodiments are now briefly described. The AHIP module may include any, all or none of these example sub-elements.

The Color Map Unit

The color map unit module produces a map of pixels having similar colors. A reference point R may be pre-programmed in an input color space (e.g., RGB). Then, for each input pixel P, the ColorMap module may compute a Euclidean distance d between P and R, and then compare it to programmable thresholds (T1 . . . T15). These may be evaluated in a cascade, so they may advantageously be disposed in increasing order (T1<T2 . . . <T15, from very strict to very permissive). If d<T_(n), then the module outputs the value 16−n. If no threshold is matched, then 0 is sent out. FIG. 8 illustrates a color thresholding module in accordance with certain embodiments.
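
A minimal sketch of this cascade in C (using squared distances to avoid a square root, which is an implementation assumption rather than something stated above):

```c
#include <stdint.h>

typedef struct {
    uint8_t  r, g, b;       /* pre-programmed reference point R       */
    uint32_t t_sq[15];      /* squared thresholds T1^2 < ... < T15^2  */
} colormap;

/* Returns 15 for the strictest (closest) match band down to 1,
 * or 0 if no threshold is matched. */
static uint8_t colormap_pixel(const colormap *cm,
                              uint8_t r, uint8_t g, uint8_t b)
{
    int dr = r - cm->r, dg = g - cm->g, db = b - cm->b;
    uint32_t d_sq = (uint32_t)(dr * dr + dg * dg + db * db);

    for (int n = 1; n <= 15; n++)       /* cascade: strictest first */
        if (d_sq < cm->t_sq[n - 1])
            return (uint8_t)(16 - n);   /* d < T_n  =>  output 16 - n */
    return 0;
}
```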

The output of the colormap may typically be represented by 4 bits. In some embodiments a smaller set of thresholds may be implemented, e.g., using only 8 (3-bit), or 4 (2-bit), or 2 (1-bit). The ColorMap module may operate on an input full resolution image. The output map may have the same resolution as the input, and each pixel may generate a matching output that indicates how closely it is matched to the pre-programmed reference pixel in a 3D color space.

In alternative embodiments, additional parallel color modules may enable a determination of how closely a pixel is matched to 2 of 3 color channels by setting one of the input channels to a zero or mid-point value. Other embodiments implement a 2D color module where only 2 of 3 channels are used for comparing.

As the functionality of the ColorMap module depends on the Euclidean distance, the input color space should be one where this distance is meaningful (proximity is equivalent to visual similarity). Normally, the module should be used with RGB input, but it is not limited to that. In FIG. 9, an example is provided of a ColorMap module being applied to an image. The reference point in this example is a skin-like color given in RGB coordinates. On the left in FIG. 9 there is shown the input image, and on the right is the colormap (white represents the maximum value; the range is 0 to 15). In the example of FIG. 9, 16 thresholds (4 bit) were used.

Color Space Conversion (CSC) Unit

A CSC may be used to implement a programmable matrix multiplier defined as follows:

$$Ax + B = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ a_{41} & a_{42} & a_{43} \end{bmatrix} \begin{bmatrix} Y \\ U \\ V \end{bmatrix} + \begin{bmatrix} b_{1} \\ b_{2} \\ b_{3} \\ b_{4} \end{bmatrix} = \begin{bmatrix} a_{11}Y + a_{12}U + a_{13}V + b_{1} \\ a_{21}Y + a_{22}U + a_{23}V + b_{2} \\ a_{31}Y + a_{32}U + a_{33}V + b_{3} \\ a_{41}Y + a_{42}U + a_{43}V + b_{4} \end{bmatrix}$$

where x = [Y, U, V] is the input pixel, and A and B are matrices with programmable coefficients.

This structure can perform conversions like YUV-2-RGB or RGB-2-YUV. It can also perform conversions between YUV and custom color spaces. These conversions may be used in conjunction with other modules in order to identify pixels with special properties.

A saturation function may be implemented at the output of the CSC (to limit values to the integer range of 0 . . . 255).
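
A minimal sketch of this CSC stage, combining the 4×3 matrix multiply above with the output saturation (plain integer coefficients are used for clarity; a hardware implementation would use fixed-point scaling, which is an assumption here):

```c
#include <stdint.h>

typedef struct {
    int a[4][3];    /* programmable matrix A */
    int b[4];       /* programmable offset B */
} csc;

static uint8_t sat8(int v)              /* saturate to 0 .. 255 */
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* out = saturate(A * in + B) for one pixel, in = [Y, U, V]. */
static void csc_pixel(const csc *c, const uint8_t in[3], uint8_t out[4])
{
    for (int i = 0; i < 4; i++) {
        int acc = c->b[i];
        for (int j = 0; j < 3; j++)
            acc += c->a[i][j] * in[j];
        out[i] = sat8(acc);
    }
}
```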

FIG. 10 illustrates an AHIP module 170 that receives RGB data from an image sensor 172 and provides data to a memory 174, including pixel data and/or frame data. The example AHIP module 170 of FIG. 10 includes pixel processing modules 176. Two of the modules 176 feed thresholding modules 178 and two of the modules 176 feed counting modules, one histogram 180 and the other registration 182. A logic module 184 receives input from the thresholding modules 178 and outputs pixel data to memory 174. A bin counter 186 receives input from the counting (histogram) module 180 and outputs frame data to memory 174. A dX, dY offset block 188 receives input from the counting (registration) module 182 and outputs frame data to the memory 174. A first RGB-2-YUV block 190 outputs to an accumulator (integral image) module 192, which in turn outputs pixel data to memory 174. A second RGB-2-YUV block 190 outputs to a squaring module 194 and to an accumulator (integral squared image) module 196, which in turn outputs pixel data to memory 174.

FIG. 11 illustrates an AHIP block 200 communicating with a CPU 202, including providing interrupts 204 to the CPU 202 and receiving configuration commands 206 from the CPU 202. The AHIP block 200 receives data from a sensor/B2Y 208 via a B2Y block 210 and outputs to memory 212.

Thresholding (Thr)

This module contains four 8×1 LUTs for thresholding the outputs of other modules and converting them to binary values (1-bit). Multiple thresholding units can be incorporated within the AHIP.

Logic Function (LF)

This module contains a 6×6 LUT which can be used to further combine the results of the THR or other AHIP modules such as the ColorMap module. It can implement 6 different logic functions with the inputs from the THR and SKIN modules. Multiple logic units can be incorporated within the AHIP.
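
One way to read these two LUT-based units, sketched below under the assumption that "8×1" means an 8-bit-in/1-bit-out table and "6×6" means a 6-bit-in/6-bit-out table (the actual AHIP LUT geometry may differ):

```c
#include <stdint.h>

typedef struct {
    uint8_t thr_lut[4][256];   /* four 8-in/1-out thresholding LUTs */
    uint8_t lf_lut[64];        /* one 6-in/6-out logic LUT          */
} lut_units;

/* THR: reduce an 8-bit module output to a single binary value. */
static uint8_t thr(const lut_units *u, int unit, uint8_t module_out)
{
    return u->thr_lut[unit][module_out] & 1u;
}

/* LF: combine six 1-bit inputs (e.g., THR and SKIN outputs) into six
 * independently programmable logic outputs. */
static uint8_t logic_combine(const lut_units *u, uint8_t six_bits_in)
{
    return u->lf_lut[six_bits_in & 0x3Fu] & 0x3Fu;
}
```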

Histograms (Hist)

Histograms may be computed on the original input data (YUV) and/or on the output of the CSC module. These modules accumulate a count value from each pixel, and their output is available after each pixel of the image frame has been clocked through the AHIP. The number of histogram bins is typically 16, and the thresholds for each bin are programmable.

Integral Image Accumulators (II), also known as Area Computation Modules

These modules contain blocks for computing integral images (which allow for fast area computations in rectangular regions; see, e.g., U.S. Pat. No. 7,315,631, incorporated by reference above, and US 2002/0102024, incorporated by reference). They are usually employed in real-time object detection algorithms. There may be any of three standard blocks available:

-   II (summed area for the original Y channel, used for area computation)
-   II2 (sum over Y squared, used for variance computation)
-   Skin II (integral image on the skin map, gives the skin density in an area)

These modules may be used to accumulate their values on each input pixel and output the current accumulated value to provide the corresponding pixel value of the Integral Image, Integral Variance or Skin Integral Image maps (see, e.g., US published application no. 2010/0053368, incorporated by reference).
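
A minimal sketch of such a streaming accumulator: using one row of running column sums, it emits for each clocked-in pixel the sum of all pixels above and to the left of it (inclusive). Feeding luminance gives II, feeding squared luminance gives II2, and feeding skin-map values gives Skin II; the width constant is an illustrative assumption.

```c
#include <stdint.h>
#include <string.h>

#define W 1920                 /* row length; an assumption */

typedef struct {
    uint32_t col_sum[W];       /* running total of each column so far     */
    uint32_t row_acc;          /* current row's running left-to-right sum */
    int      x;
} integral_acc;

static void ii_reset(integral_acc *a)
{
    memset(a, 0, sizeof *a);
}

/* Clock in one value; returns the integral-image value at the current
 * pixel, i.e., the sum over all pixels above and to the left of it
 * (inclusive). */
static uint32_t ii_clock_pixel(integral_acc *a, uint32_t v)
{
    a->col_sum[a->x] += v;            /* column total down to this row    */
    a->row_acc += a->col_sum[a->x];   /* accumulate columns left to right */
    uint32_t out = a->row_acc;
    if (++a->x == W) { a->x = 0; a->row_acc = 0; }
    return out;
}
```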

Complex Downsampler (DS)

This downsampler scales the image for the other modules. The module was designed to provide a reasonable level of quality while having a low gate-count. It can be programmed to achieve variable downsizing of the main image.

Fixed Downsamplers (XDS)

A simpler downsampler implementation is also available using nearest neighbor interpolation. These allow simultaneous computations of maps at a fixed resolution. Typically ×2, ×4 and ×8 downsamplers will be available.
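
A minimal sketch of such a fixed nearest-neighbor downsampler: for a power-of-two factor it simply forwards every Nth pixel of every Nth row, which is why ×2, ×4 and ×8 variants are cheap enough to run in parallel. The streaming interface is an illustrative assumption.

```c
#include <stdint.h>

typedef struct {
    int factor;                /* 2, 4 or 8 */
    int x, y;                  /* position of the current input pixel */
    int width;                 /* input row length */
} xds;

/* Clock in one pixel; returns 1 and sets *out when the pixel survives
 * the nearest-neighbor decimation. */
static int xds_clock_pixel(xds *d, uint8_t in, uint8_t *out)
{
    int keep = (d->x % d->factor == 0) && (d->y % d->factor == 0);
    if (keep) *out = in;
    if (++d->x == d->width) { d->x = 0; d->y++; }
    return keep;
}
```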

Workflow

AHIP may be designed to be integrated at the end of the image pipeline, after a De-Bayer module (sensor interface). While YUV is considered to be the native color space of AHIP, some internal modules can handle other color spaces as well. In addition to the sensor interface, AHIP provides a playback interface for reading data from memory. In preview mode, AHIP may be configured to handle data at the sensor clock rate (one clock per pixel). Data processed by AHIP may be written to a main memory of a target system (SDRAM). One or more of the modules may output values into registers. AHIP may or may not be configured to provide a streaming output interface to other modules.

An example of a typical workflow with AHIP is the following:

-   Live data from the sensor or from memory is sent to AHIP (one pixel at a time).
-   During the period of a frame, AHIP performs its computations:
    -   One or more modules update internal statistics.
    -   One or more other modules write data (maps) to the system memory. This may typically be done with minimal delay with respect to the input (i.e., data goes out as pixels come in).

-   After the end of a frame, AHIP triggers an interrupt to the CPU:
    -   In normal conditions, the interrupt notifies the CPU that fresh data is available.
    -   There may also be interrupts signalling error conditions.
-   The CPU interrupt handler is called:
    -   It reads AHIP registers to find the reason for the interrupt.
    -   It then optionally reconfigures AHIP (e.g., based on a determination that it would be advantageous to do so), and acknowledges the interrupt.
    -   It also signals the other CPU threads that data from the hardware is available (and IS algorithms start using it).
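
A minimal sketch of the CPU side of this workflow; the register names, status bits and helper functions below are hypothetical, not the actual AHIP programming interface:

```c
#include <stdint.h>

#define AHIP_STATUS_FRAME_DONE (1u << 0)     /* hypothetical status bits */
#define AHIP_STATUS_ERROR      (1u << 1)

extern volatile uint32_t *ahip_status_reg;   /* hypothetical registers */
extern volatile uint32_t *ahip_irq_ack_reg;

extern void handle_ahip_error(uint32_t status);      /* hypothetical */
extern void ahip_reconfigure_if_needed(void);        /* hypothetical */
extern void signal_scene_analysis_threads(void);     /* hypothetical */

void ahip_irq_handler(void)
{
    uint32_t status = *ahip_status_reg;      /* find the interrupt reason */

    if (status & AHIP_STATUS_ERROR) {
        handle_ahip_error(status);
    } else if (status & AHIP_STATUS_FRAME_DONE) {
        /* Optionally reprogram LUTs, thresholds or chain configuration
         * before the next frame is clocked through. */
        ahip_reconfigure_if_needed();
        /* Wake the threads that consume the fresh maps and primitives. */
        signal_scene_analysis_threads();
    }
    *ahip_irq_ack_reg = status;              /* acknowledge the interrupt */
}
```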

While exemplary drawings and specific embodiments of the present invention have been described and illustrated, it is to be understood that the scope of the present invention is not to be limited to the particular embodiments discussed. Thus, the embodiments shall be regarded as illustrative rather than restrictive, and it should be understood that variations may be made in those embodiments by workers skilled in the art without departing from the scope of the present invention.

In addition, in methods that may be performed according to preferred embodiments herein and that may have been described above, the operations have been described in selected typographical sequences. However, the sequences have been selected and so ordered for typographical convenience and are not intended to imply any particular order for performing the operations, except for those where a particular order may be expressly set forth or where those of ordinary skill in the art may deem a particular order to be necessary.

In addition, all references cited above and below herein, as well as the background, invention summary, abstract and brief description of the drawings, are all incorporated by reference into the detailed description of the preferred embodiments as disclosing alternative embodiments. The following are incorporated by reference: U.S. Pat. Nos. 7,715,597, 7,702,136, 7,692,696, 7,684,630, 7,680,342, 7,676,108, 7,634,109, 7,630,527, 7,620,218, 7,606,417, 7,587,068, 7,403,643, 7,352,394, 6,407,777, 7,269,292, 7,308,156, 7,315,631, 7,336,821, 7,295,233, 6,571,003, 7,212,657, 7,039,222, 7,082,211, 7,184,578, 7,187,788, 6,639,685, 6,628,842, 6,256,058, 5,579,063, 6,480,300, 5,781,650, 7,362,368, 7,551,755, 7,515,740, 7,469,071 and 5,978,519; and

U.S. published application nos. 2005/0041121, 2007/0110305, 2006/0204110, PCT/US2006/021393, 2005/0068452, 2006/0120599, 2006/0098890, 2006/0140455, 2006/0285754, 2008/0031498, 2007/0147820, 2007/0189748, 2008/0037840, 2007/0269108, 2007/0201724, 2002/0081003, 2003/0198384, 2006/0276698, 2004/0080631, 2008/0106615, 2006/0077261, 2007/0071347, 2006/0228040, 2006/0228039, 2006/0228038, 2006/0228037, 2006/0153470, 2004/0170337, 2003/0223622, 2009/0273685, 2008/0240555, 2008/0232711, 2009/0263022, 2008/0013798, 2007/0296833, 2008/0219517, 2008/0219518, 2008/0292193, 2008/0175481, 2008/0220750, 2008/0219581, 2008/0112599, 2008/0317379, 2008/0205712, 2009/0080797, 2009/0196466, 2009/0080713, 2009/0303343, 2009/0303342, 2009/0189998, 2009/0179998, 2009/0189997, 2009/0190803, 2010/0141787, 2010/0165150, 2010/0066822, 2010/0053368, and 2009/0179999; and

U.S. patent application Nos. 61/361,868, 61/311,264, 60/829,127, 60/914,962, 61/019,370, 61/023,855, 61/221,467, 61/221,425, 61/221,417, Ser. No. 12/748,418, 61/182,625, 61/221,455, Ser. Nos. 12/479,658, 12/063,089, 61/091,700, 61/120,289, Ser. Nos. 12/827,868, 12/824,204, 12/820,002, 12/784,418, 12/710,271, 12/636,647, 12/572,930, and 12/479,593.

What is claimed is:
1. An image processing hardware unit comprising: a frame receiving component configured to: receive frame data of a first image frame; a frame processing component configured to: generate, based on the frame data of the first image frame, an image processing primitive; wherein the image processing primitive is generated by identifying a sub-region of the first image frame and generating the image processing primitive from data of the sub-region; wherein the image processing primitive comprises an image map; an integral image accumulator configured to: generate, based on the image map, an integral image (II) primitive by computing for a first pixel of the II primitive a sum of luminance values of all pixels located above and to the left from the first pixel in the image map; and store the II primitive and the first image frame in a memory store.

2. The image processing hardware unit of claim 1, wherein the integral image accumulator is further configured to: generate, based on the II primitive, an integral image square (II2) primitive as a product of the II primitive multiplied by the II primitive; and store the II2 primitive in the memory store.

3. The image processing hardware unit of claim 2, wherein the integral image accumulator is further configured to: for each pixel of the sub-region, determine a similarity score indicating a similarity between a color of the pixel and a skin color; determine a skin II primitive based on the similarity scores determined for the sub-region; and store the skin II primitive in the memory store.
4. The image processing hardware unit of claim 3, wherein the II primitive is used to match one or more Haar classifiers to an image luminance value to determine whether a face is depicted in the sub-region; wherein the II2 primitive is used to determine a variance in color values within the sub-region; and wherein the skin II primitive is used to determine a probability that a skin region is depicted in the sub-region.

5. The image processing hardware unit of claim 4, wherein the frame processing component is further configured to: determine whether the variance is low, and if so, determine that the sub-region unlikely depicts the face; and determine whether the variance is high, and if so, determine that the sub-region probably depicts the face.

6. The image processing hardware unit of claim 5, wherein the first image frame is a video frame.

7. The image processing hardware unit of claim 6, wherein the frame processing component is further configured to: generate, based on the frame data of the first image frame, a skin-map primitive; wherein the skin-map primitive is a bit-map; and store the skin-map primitive in the memory store.

8. An image processing method comprising: receiving frame data of a first image frame; generating, based on the frame data of the first image frame, an image processing primitive; wherein the image processing primitive is generated by identifying a sub-region of the first image frame and generating the image processing primitive from data of the sub-region; wherein the image processing primitive comprises an image map; generating, based on the image map, an integral image (II) primitive by computing for a first pixel of the II primitive a sum of luminance values of all pixels located above and to the left from the first pixel in the image map; and storing the II primitive and the first image frame in a memory store; wherein the method is performed using one or more computing devices.

9. The image processing method of claim 8, further comprising: generating, based on the II primitive, an integral image square (II2) primitive as a product of the II primitive multiplied by the II primitive; and storing the II2 primitive in the memory store.

10. The image processing method of claim 9, further comprising: for each pixel of the sub-region, determining a similarity score indicating a similarity between a color of the pixel and a skin color; determining a skin II primitive based on the similarity scores determined for the sub-region; and storing the skin II primitive in the memory store.

11. The image processing method of claim 10, wherein the II primitive is used to match one or more Haar classifiers to an image luminance value to determine whether a face is depicted in the sub-region; wherein the II2 primitive is used to determine a variance in color values within the sub-region; and wherein the skin II primitive is used to determine a probability that a skin region is depicted in the sub-region.
12. The image processing method of claim 11, further comprising: determining whether the variance is low, and if so, determining that the sub-region unlikely depicts the face; and determining whether the variance is high, and if so, determining that the sub-region probably depicts the face.

13. The image processing method of claim 12, wherein the first image frame is a video frame.
14. The image processing method of claim 13, further comprising: generating, based on the frame data of the first image frame, a skin-map primitive; wherein the skin-map primitive is a bit-map; and storing the skin-map primitive in the memory store.

15. A non-transitory computer-readable storage medium storing one or more computer instructions which, when executed, cause one or more processors to perform: receiving frame data of a first image frame; generating, based on the frame data of the first image frame, an image processing primitive; wherein the image processing primitive is generated by identifying a sub-region of the first image frame and generating the image processing primitive from data of the sub-region; wherein the image processing primitive comprises an image map; generating, based on the image map, an integral image (II) primitive by computing for a first pixel of the II primitive a sum of luminance values of all pixels located above and to the left from the first pixel in the image map; and storing the II primitive and the first image frame in a memory store.

16. The non-transitory computer-readable storage medium of claim 15, further comprising additional instructions which, when executed, cause the one or more processors to perform: generating, based on the II primitive, an integral image square (II2) primitive as a product of the II primitive multiplied by the II primitive; and storing the II2 primitive in the memory store.

17. The non-transitory computer-readable storage medium of claim 16, further comprising additional instructions which, when executed, cause the one or more processors to perform: for each pixel of the sub-region, determining a similarity score indicating a similarity between a color of the pixel and a skin color; determining a skin II primitive based on the similarity scores determined for the sub-region; and storing the skin II primitive in the memory store.

18. The non-transitory computer-readable storage medium of claim 17, wherein the II primitive is used to match one or more Haar classifiers to an image luminance value to determine whether a face is depicted in the sub-region; wherein the II2 primitive is used to determine a variance in color values within the sub-region; and wherein the skin II primitive is used to determine a probability that a skin region is depicted in the sub-region.
19. The non-transitory computer-readable storage medium of claim 18, further comprising additional instructions which, when executed, cause the one or more processors to perform: determining whether the variance is low, and if so, determining that the sub-region unlikely depicts the face; and determining whether the variance is high, and if so, determining that the sub-region probably depicts the face.

20. The non-transitory computer-readable storage medium of claim 19, wherein the first image frame is a video frame.